Welcome and Introduction
Introduction to Descriptive Statistics
Methods of Displaying Data
Prof. Dr. Constantinos Antoniou
Chair of Transportation Systems Engineering
c.antoniou@tum.de
Tuesday, October 18, 2022
Applied Statistics in Transport
Practical information - Lecturers
Prof. Dr. C. Antoniou, c.antoniou@tum.de
Mohamed Abouelela, mohamed.abouelela@tum.de
18/10/2022Prof. Dr. Constantinos Antoniou | Applied Statistics in Transport
Course Topics
1. Introduction to descriptive statistics
2. Methods of displaying data
3. Probability theory and important distributions
4. Confidence intervals and sample sizes
5. Statistical testing / hypothesis testing
6. Correlation and regression
Credits
Some lectures rely on material from Prof. Haris N. Koutsopoulos
(Northeastern University) and Prof. Petros Vythoulkas, and on the
book by Washington, Karlaftis and Mannering (2003, 2009)
BACKGROUND AND MOTIVATION
Why Study Probability and Statistics?
https://xkcd.com/936/
Uncertainty
Values are not the same under the same conditions
Peak traffic flows
Annual rainfalls
Steel yield strengths
911 emergency calls
Number of people served at a bank window
Variability
Important implications for
Decision making
Design
Operations
Tools for studying and dealing with uncertainty
Probability and statistics
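A minimal sketch of how this variability can be quantified, assuming ten hypothetical peak-hour flow counts taken at the same site under nominally identical conditions. The values are made up for illustration and do not come from the lecture; the summary measures used (mean, sample standard deviation, coefficient of variation) are standard descriptive statistics.

```python
import statistics

# Hypothetical peak-hour flows (vehicles/hour) observed at the same site,
# same weekday and hour, on ten different days. Illustrative values only.
peak_flows = [1810, 1765, 1902, 1688, 1844, 1790, 1951, 1723, 1876, 1801]

mean_flow = statistics.mean(peak_flows)    # central tendency
std_flow = statistics.stdev(peak_flows)    # sample standard deviation (n - 1)
cv = std_flow / mean_flow                  # coefficient of variation (unitless)

print(f"mean = {mean_flow:.1f} veh/h")
print(f"std  = {std_flow:.1f} veh/h")
print(f"CV   = {cv:.3f}")
```

Even under the "same" conditions the counts differ from day to day; the standard deviation and the coefficient of variation give that spread a number that design and operations decisions can use.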
Wait/Walk dilemma
Waiting for a bus at a stop
Duration of the wait may exceed the time to walk to your destination
2008 "Year in Ideas", The New York Times Magazine
Thompson, Clive (2008-12-13). "The Bus-Wait Formula"
Wikipedia
https://en.wikipedia.org/wiki/Wait/walk_dilemma
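The dilemma can be explored with a small simulation. The sketch below is not the formula from the cited article; it assumes, purely for illustration, that the wait is uniformly distributed between 0 and the headway, and it uses hypothetical values for the headway, walking time, and ride time. The function name `prob_waiting_is_slower` is likewise made up for this example.

```python
import random

def prob_waiting_is_slower(headway, walk_time, ride_time, n=100_000, seed=0):
    """Estimate P(wait + ride > walk), assuming wait ~ Uniform(0, headway)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    slower = sum(
        1 for _ in range(n)
        if rng.uniform(0, headway) + ride_time > walk_time
    )
    return slower / n

# Hypothetical scenario, all times in minutes.
headway, walk_time, ride_time = 20.0, 15.0, 4.0

p_sim = prob_waiting_is_slower(headway, walk_time, ride_time)

# Closed form under the same uniform-wait assumption:
# waiting loses whenever the wait exceeds (walk_time - ride_time).
threshold = min(max(walk_time - ride_time, 0.0), headway)
p_exact = 1 - threshold / headway

print(f"simulated: {p_sim:.3f}, exact: {p_exact:.3f}")
```

Under these assumptions the simulated share of cases in which waiting is slower than walking should be close to the closed-form value; changing the headway or walking time shows how sensitive the decision is to the uncertainty in the wait.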
Source: http://www.pindropsecurity.com/data-science-how-do-we-get-started-part-one/
Explosion in Data Availability
Information has gone from scarce to
superabundant. That brings huge new
benefits but also big headaches.
Economist, Feb. 2010
Explosion in Data Availability
Source: TomTom
Challenges
Data can be very noisy
Measurement errors
Other sources
Big Data: the three (four, five, …) Vs
Volume:
Increasingly massive datasets that are hard to manage
Large Hadron Collider experiment: 150 million sensors delivering data 40 million times per second
Variety:
Data complexity is growing
More types of data are captured than ever before, e.g. the quantified self
Big Data: the three (four, five, …) Vs
Velocity:
Some data arrives so rapidly that it must either be processed instantly or be lost
A whole subfield, "streaming data", deals with this
Veracity?
Value?
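To make the velocity point concrete, the sketch below shows one common way to summarize a stream without storing it: Welford's one-pass update of the mean and variance. It is offered as an illustration of streaming computation in general, not as a method prescribed in the lecture; the class name `RunningStats` and the speed readings are hypothetical.

```python
class RunningStats:
    """One-pass (Welford) running mean and sample variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the summary; x is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); NaN until two values are seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


# Hypothetical stream of speed readings (km/h), processed one at a time.
stream = [42.0, 38.5, 45.2, 40.1, 39.7]
stats = RunningStats()
for reading in stream:
    stats.update(reading)

print(stats.n, round(stats.mean, 2), round(stats.variance, 2))
```

Each observation is used once and then discarded, so memory use stays constant no matter how long the stream runs.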
Impact of Big Data
Big Data promises to revolutionize numerous areas:
Big science:
  Personalized genomics
  Meteorology
Entertainment:
  Netflix recommender system, $1,000,000 challenge to improve the system
  Hit show 'House of Cards' designed based on data analysis
Machine Learning
Big Data sets are too large for a human to analyze
They require computers that can learn the structure and patterns in the data to extract meaningful insights and applications
Machine learning and Big Data are inextricably linked
ML is hard to define: it contains elements of Artificial Intelligence, Statistics, Computer Science, Control Theory, and Engineering
So what?
How can data help plan, manage and operate transportation
systems?
Skills needed in data science
[National Institute of Standards and Technology (NIST)]
Source: NIST Big Data. "Draft NIST Big Data Interoperability Framework, Volume 1", 2014.
http://docplayer.net/7239072-Draft-nist-big-data-interoperability-framework-volume-1-definitions.html.
Data Science
Source: https://en.wikipedia.org/wiki/Data_science
CRISP-DM Process Model for Data Mining
Source: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.198.5133&rep=rep1&type=pdf
Source: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.198.5133&rep=rep1&type=pdf
CRISP-DM Tasks and their Outputs
The Human Centered KDD Process and the SEMMA
Methodological Steps
Source: Mariscal, Gonzalo, Oscar Marban, and Covadonga Fernandez. "A survey of data mining and knowledge discovery process models and methodologies." The Knowledge Engineering Review 25.02 (2010): 137-166
The SEMMA Model Development Process
Source: http://www.sas.com/content/dam/SAS/en_gb/doc/other1/events/sasforum/slides/manchester-day2/I.%20Brown%20Data%20Exploration%20and%20Visualisation%20in%20SAS%20EM_IB.pdf
Guide to Analytic Selection
(Booz Allen Hamilton)
Source: http://www.boozallen.com/insights/2015/12/data-science-field-guide-second-edition.
Degree of Intelligence in Data Analytics
Source: Adapted from: Davenport, Thomas H., and Jeanne G. Harris. Competing on analytics: The new
science of winning. Harvard Business Press, 2007
Data preparation
Our data has to follow our assumptions for x and y
All sorts of little tasks
Parse datasets
Convert value types (e.g. numeric to nominal)
Eliminate errors and (useless) outliers
Obtain intermediate values (e.g. $x_{n+1} = f(x_1, x_2)$)
Descriptive statistics
This is where we spend MOST of the time! Some people say
90%...
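The preparation tasks listed above can be sketched on a toy dataset. Everything in the example is hypothetical and chosen only for illustration: the column names, the link records, the road-type codes, and the choice of speed as the derived value $x_{n+1} = f(x_1, x_2)$. It uses only the Python standard library and is a sketch of the workflow, not a prescribed pipeline.

```python
import csv
import io
import statistics

# Hypothetical raw link records; column names and values are illustrative.
raw = """link_id,length_km,travel_time_min,road_type
A1,2.0,3.0,1
A2,1.5,2.5,2
A3,3.0,,1
A4,2.4,-4.0,2
A5,1.8,2.7,1
"""

ROAD_TYPES = {"1": "urban", "2": "rural"}  # numeric code -> nominal label

records = []
for row in csv.DictReader(io.StringIO(raw)):        # parse the dataset
    try:
        length = float(row["length_km"])             # convert text to numbers
        time_min = float(row["travel_time_min"])
    except ValueError:
        continue                                     # drop rows with missing values
    if length <= 0 or time_min <= 0:
        continue                                     # drop physically impossible values
    records.append({
        "link_id": row["link_id"],
        "road_type": ROAD_TYPES[row["road_type"]],   # numeric to nominal
        "speed_kmh": length / (time_min / 60.0),     # derived value f(x1, x2)
    })

# Descriptive statistics on the cleaned, derived variable.
speeds = [r["speed_kmh"] for r in records]
print(len(records), "valid records")
print("mean speed:", round(statistics.mean(speeds), 1), "km/h")
print("median speed:", round(statistics.median(speeds), 1), "km/h")
```

Most of the code is cleaning and conversion rather than analysis, which mirrors the point that preparation takes most of the time.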
Data analysts - the bad news
Data analysts - the “good” news